Updated on: 2022-07-28
Through this project, my intention is to apply what I have learnt in R onto one of the areas that I always had an interest in, which is stock portfolio optimization. It documents the commonly used packages and functions in R that are useful for portfolio analysis and possibly even for back-testing. It is not intended to be a detailed or advanced guide to portfolio theory and R application.
The first part starts off with understanding some basic functions in the relevant packages, which would include importing stock prices from Yahoo Finance to calculating returns and important metrics, such as Beta and Sharpe Ratio. The second part would involve tools that we can use for single-period optimization and multi-period optimization with back-testing of portfolio optimization strategy.
#Use install.packages() or go to the Packages panel to install packages if they have not been installed
library(dplyr)
library(PerformanceAnalytics) # For portfolio performance and risk analysis
library(PortfolioAnalytics) # For portfolio optimization and analysis
library(quantmod) # For obtaining historical prices from Yahoo Finance
library(ROI) # For ROI solver in portfolio optimization
library(ROI.plugin.glpk) # Part of the ROI solver for linear optimization
library(ROI.plugin.quadprog) #Part of the ROI solver for quadratic optimization
The getSymbols
function allows us to
retrieve historical price data from sources such as Yahoo Finance and
FRED. For this project, since we are dealing with portfolio
optimization, I have retrieved the stock price data of Amazon (AMZN)
from Yahoo Finance. Data can be retrieved for any stock that has a
ticker listed in Yahoo Finance.
<- getSymbols(Symbols = "AMZN",
AMZN src = "yahoo",
# Can instead choose "weekly" or "monthly" to import weekly or monthly price data
periodicity = "daily",
auto.assign = F)
Let us get some details about the object AMZN
.
head(AMZN, n = 4); tail(AMZN, n = 4)
## AMZN.Open AMZN.High AMZN.Low AMZN.Close AMZN.Volume AMZN.Adjusted
## 2007-01-03 1.9340 1.9530 1.9025 1.9350 248102000 1.9350
## 2007-01-04 1.9295 1.9570 1.9130 1.9450 126368000 1.9450
## 2007-01-05 1.9360 1.9395 1.8800 1.9185 132394000 1.9185
## 2007-01-08 1.9110 1.9155 1.8585 1.8750 135660000 1.8750
## AMZN.Open AMZN.High AMZN.Low AMZN.Close AMZN.Volume AMZN.Adjusted
## 2022-07-22 125.01 125.50 121.35 122.42 51402700 122.42
## 2022-07-25 122.70 123.64 120.03 121.14 50221300 121.14
## 2022-07-26 115.79 118.15 114.53 114.81 67075100 114.81
## 2022-07-27 117.31 121.90 117.16 120.97 61480400 120.97
colSums(is.na(AMZN))
## AMZN.Open AMZN.High AMZN.Low AMZN.Close AMZN.Volume
## 0 0 0 0 0
## AMZN.Adjusted
## 0
AMZN
contains the Opening Prices, High and Low Prices,
Closing Prices, Volume and Adjusted Closing Prices of each day. It
ranges from 03 January 2007 (the earliest date for which data can be
imported) to 27 May 2022 (the most recent date when this was written).
The colSums
function can be used to find
if there are any NA
rows, which represents missing data, in
the columns.
The Adjusted Price column will be most useful for our analysis in
general as it has been adjusted for splits and dividends. I have subset
that column into a new object called AMZN.Adj
, calculate
the discrete return of the stock and plot its cumulative returns over
time.
<- AMZN[,6]
AMZN.Adj
# na.omit() removes the first row, which is an NA value as there are no prior data to calculate returns
<- na.omit(Return.calculate(AMZN.Adj, method = "discrete"))
AMZN_Returns # We could use Return.calculate(AMZN.Adj, method = "log") to calculate the log return instead.
chart.CumReturns(AMZN_Returns,
legend.loc = "topleft",
# Could specify geometric = F to use simple/arithmetic return when using log returns in the previous step
geometric = T,
main = "Cumulative Daily Return of Amazon")
To make interpretation easier, I have opted to use discrete returns
in the Return.calculate
function and
geometric chaining in the chart.CumReturns
function. For continuous compounding, log returns can be used instead,
which can be added across time and simple/arithmetic chaining should be
chosen instead of geometric chaining. However, log returns cannot be
added across securities in a portfolio, which is another reason for
choosing discrete returns.
Now that we have an understanding of how to import stock prices and calculate returns, we can start to create a portfolio with more stocks. Firstly, we can create a vector of the stock tickers that we are interested in analyzing and adding to our portfolio.
<- c("JNJ", "PG", "AAPL", "TSM", "MSFT", "NVDA") tickers
I have created an empty object called price_data
which
will be used to aggregate the stock price data we will be importing. I
had also indicated a start and end date for the period which data will
be collected. We can also subset the Adjusted Closing Price column when
we import the price data to simplify the work.
<- NULL
price_data
<- "2012-01-01"
startdate <- "2022-05-23"
enddate
for (ticker in tickers) {
<- cbind(price_data,
price_data getSymbols(ticker,
src = "yahoo",
from = startdate, to = enddate,
periodicity = "daily", auto.assign = F)[,6])
}
colnames(price_data) <- tickers
Let us see some details about the object price_data
.
head(price_data, n = 4); tail(price_data, n = 4)
## JNJ PG AAPL TSM MSFT NVDA
## 2012-01-03 49.02983 48.56105 12.55747 9.633207 21.57289 3.223092
## 2012-01-04 48.73216 48.53924 12.62495 9.546028 22.08058 3.259823
## 2012-01-05 48.67261 48.33577 12.76511 9.633207 22.30622 3.376900
## 2012-01-06 48.24839 48.21952 12.89856 9.553293 22.65274 3.337875
## JNJ PG AAPL TSM MSFT NVDA
## 2022-05-17 177.6783 153.6823 149.24 92.83105 266.20 181.7316
## 2022-05-18 174.3795 144.1045 140.82 90.05566 254.08 169.3442
## 2022-05-19 172.8294 140.7860 137.35 89.73734 253.14 171.2038
## 2022-05-20 175.8500 140.8754 137.59 90.30435 252.56 166.9047
nrow(price_data); colSums(is.na(price_data))
## [1] 2614
## JNJ PG AAPL TSM MSFT NVDA
## 0 0 0 0 0 0
The nrow
function returns us the number
of rows, representing the number of data points of each stock, and
colSums
shows us that there are no missing
data in any of the columns.
Based on most textbooks on portfolio theory, the calculation of portfolio return is: \[ R_p = \sum_{i=1}^n w_iR_i \] where \(R_i\) is the return of security \(i\).
Therefore, we need to calculate the discrete returns of the securities.
<- na.omit(Return.calculate(price_data, method = "discrete")) returns
Subsequently, we can use the
Return.portfolio
function and geometric
chaining to calculate the portfolio’s daily returns. There is an option
to specify a vector of desired weights for the individual securities and
leaving it out uses an equal weight as the default.
To view the object r_noRebal
we have to use the
lapply
function as the object is a list.
Due to the size of the output, I have indicated to show only the first
four rows of each element in the list.
<- Return.portfolio(returns,
r_noRebal geometric = T,
verbose = T)
colnames(r_noRebal$returns) <- "Rp_noRebal"
lapply(r_noRebal, head, n = 4)
## $returns
## Rp_noRebal
## 2012-01-04 0.0041223277
## 2012-01-05 0.0102229941
## 2012-01-06 -0.0007821064
## 2012-01-09 0.0012500645
##
## $contribution
## JNJ PG AAPL TSM MSFT
## 2012-01-04 -0.0010118738 -7.483707e-05 0.0008956558 -0.001508307 0.003922323
## 2012-01-05 -0.0002015900 -6.954705e-04 0.0018526637 0.001502115 0.001736058
## 2012-01-06 -0.0014215841 -3.933200e-04 0.0017459907 -0.001363003 0.002639178
## 2012-01-09 0.0002495992 6.889309e-04 -0.0002679227 0.002852148 -0.002272691
## NVDA
## 2012-01-04 0.001899367
## 2012-01-05 0.006029218
## 2012-01-06 -0.001989368
## 2012-01-09 0.000000000
##
## $BOP.Weight
## JNJ PG AAPL TSM MSFT NVDA
## 2012-01-04 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## 2012-01-05 0.1649747 0.1659079 0.1668744 0.1644803 0.1698887 0.1678740
## 2012-01-06 0.1631057 0.1635406 0.1670196 0.1643028 0.1698879 0.1721434
## 2012-01-09 0.1618107 0.1632749 0.1688977 0.1630673 0.1726622 0.1702872
##
## $EOP.Weight
## JNJ PG AAPL TSM MSFT NVDA
## 2012-01-04 0.1649747 0.1659079 0.1668744 0.1644803 0.1698887 0.1678740
## 2012-01-05 0.1631057 0.1635406 0.1670196 0.1643028 0.1698879 0.1721434
## 2012-01-06 0.1618107 0.1632749 0.1688977 0.1630673 0.1726622 0.1702872
## 2012-01-09 0.1618579 0.1637592 0.1684193 0.1657123 0.1701767 0.1700746
##
## $BOP.Value
## JNJ PG AAPL TSM MSFT NVDA
## 2012-01-04 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## 2012-01-05 0.1656548 0.1665918 0.1675623 0.1651584 0.1705890 0.1685660
## 2012-01-06 0.1654524 0.1658935 0.1694226 0.1666667 0.1723322 0.1746201
## 2012-01-09 0.1640103 0.1654945 0.1711937 0.1652841 0.1750094 0.1726021
##
## $EOP.Value
## JNJ PG AAPL TSM MSFT NVDA
## 2012-01-04 0.1656548 0.1665918 0.1675623 0.1651584 0.1705890 0.1685660
## 2012-01-05 0.1654524 0.1658935 0.1694226 0.1666667 0.1723322 0.1746201
## 2012-01-06 0.1640103 0.1654945 0.1711937 0.1652841 0.1750094 0.1726021
## 2012-01-09 0.1642633 0.1661928 0.1709222 0.1681750 0.1727058 0.1726021
An awesome feature of this function is that we can input re-balancing
periods, which can be monthly, quarterly, or yearly. If
rebalance_on
was left out, the default
assumes a buy-and-hold strategy (no re-balancing), and you may
experiment with this based on your investing preferences.
<- Return.portfolio(returns,
r_withRebal geometric = T,
rebalance_on = "quarters",
verbose = T)
colnames(r_withRebal$returns) <- "Rp_withRebal"
Indicating verbose = T
in
Return.portfolio
allows us to extract more
details about the underlying portfolio return calculation, such as the
weights at the beginning and end of periods. We take a look at how the
end-of-period weight for JNJ
changes when the portfolio is
re-balanced versus when it is not re-balanced.
<- r_noRebal$EOP.Weight
eop_weight_noRebal
<- r_withRebal$EOP.Weight
eop_weight_wRebal
par(mfrow = c(2,1), mar = c(2, 4, 2, 2))
plot.zoo(eop_weight_wRebal$JNJ,
main = "JNJ End-of-Period Weights Over Time",
ylab = "With Rebalancing")
abline(h = 1/length(colnames(eop_weight_wRebal)), col = "red", lwd = 2)
plot.zoo(eop_weight_noRebal$JNJ,
ylab = "Without Rebalancing")
abline(h = 1/length(colnames(eop_weight_wRebal)), col = "red", lwd = 2)
When we re-balance the portfolio quarterly, the end-of-period weights
tend to fluctuate around the weight of approximately 17% (which is the
weight in an equal weight portfolio). In contrast, the weight without
re-balancing showed a decreasing trend. We can find out why this
happened by plotting the end-of period weights of all stocks in the
portfolio. Weight of NVDA
had increased drastically over
time as the stock’s value grew faster in that span of time compared to
other stocks.
par(mfrow = c(1,1), mar = c(2, 2, 2, 2))
plot.zoo(eop_weight_noRebal,
main = "End-of-Period Weights Over Time Without Rebalancing")
We can use the
charts.PerformanceSummary
function to
compare the cumulative return of the portfolios with re-balancing and
without re-balancing.
<- cbind(r_noRebal$returns, r_withRebal$returns)
ret_comp
charts.PerformanceSummary(R = ret_comp,
main = "Comparison of Cumulative Returns",
legend.loc = "topleft")
We can compare the statistics of this two portfolios using
table.stats
function. We can see that
re-balancing reduced standard deviation but has also slightly reduced
returns (values are based on daily returns).
table.Stats(ret_comp)
We may also wish to compare our portfolios against a benchmark that is relevant to the portfolio we created. In this example, I have added large-cap U.S. stocks into my portfolio and so my benchmark would be the S&P500 index (ticker: SPY)
<- getSymbols("SPY",
benchmark src = "yahoo",
from = startdate, to = enddate,
periodicity = "daily", auto.assign = F)[,6]
nrow(benchmark)
## [1] 2614
colSums(is.na(benchmark))
## SPY.Adjusted
## 0
<- na.omit(Return.calculate(benchmark, method = "discrete")) %>%
benchmarkReturn `colnames<-`("SPY")
With the benchmark return, we can calculate the portfolios’ betas and
the alphas using the functions CAPM.beta
and CAPM.jensenAlpha
. These functions have
an optional input for the risk-free rate, which I had assumed to be 3%
and since we are dealing with daily returns, we need to divide by 252
trading days. Other metrics that might be of interest are the Sharpe,
Information, and Treynor Ratios.
<- round(rbind(CAPM.beta(ret_comp, benchmarkReturn, Rf = 0.03/252),
perfMetrics CAPM.alpha(ret_comp, benchmarkReturn, Rf = 0.03/252),
SharpeRatio(ret_comp, Rf = 0.03/252, FUN = "StdDev"),
TreynorRatio(ret_comp, benchmarkReturn, Rf = 0.03/252),
InformationRatio(ret_comp, benchmarkReturn)), digits = 4
)
perfMetrics
## Rp_noRebal Rp_withRebal
## Beta: SPY 1.2705 1.0314
## Alpha: SPY 0.0005 0.0004
## StdDev Sharpe (Rf=0%, p=95%): 0.0627 0.0723
## Treynor Ratio: SPY 0.2051 0.2214
## Information Ratio: SPY 0.9366 1.3389
The table.AnnualizeReturns
function can
be used to find the annualized return, standard deviation, and Sharpe
ratio.
cbind(table.AnnualizedReturns(ret_comp, Rf = 0.03/252),
table.AnnualizedReturns(benchmarkReturn, Rf = 0.03/252))
For those who are interested in the breakdown of returns by months in
each year, the function
table.CalendarReturns
can be used.
table.CalendarReturns(benchmarkReturn)
In Section 4, I had assumed that weights were given (using the default option to calculate the returns of an equal weight portfolio). In this part, we are going to optimize the weight of the stocks in our portfolio based on certain objectives.
Before we begin with the optimization, we need to create a list of
constraints and objectives for our portfolio specification. To do so, we
need to use the portfolio.spec
function
before adding constrains or objectives using
add.constraint
or
add.objective
.
<- portfolio.spec(colnames(returns))
portspec1
# Sum of weights constrained to 1, can also specify as type = "full investment"
<- add.constraint(portspec1,
portspec1 type = "weight_sum",
min_sum = 1, max_sum = 1)
# Weight constraint on each stock, max is 35% of portfolio
<- add.constraint(portspec1,
portspec1 type="box",
min = 0, max = 0.35)
# Objective to minimize risk based on variance (function will default to standard deviation as measure of risk)
<- add.objective(portspec1,
portspec1 type = "risk",
name = "var")
After indicating our desired constraints and objectives, we can
proceed to use the function
optimize.portfolio
to calculate the
optimal weights that minimizes variance.
optimize.portfolio
uses different types of
solvers to calculate the optimal weight, and since minimizing variance
is a quadratic optimization problem, we use the
quadprog
solver from the
ROI.plugin.quadprog
. We can instead
indicate ROI
which automatically chooses
between glpk
or
quadprog
solvers based on whether it is a
linear or quadratic optimization problem.
<- optimize.portfolio(returns,
port_MV
portspec1,optimize_method = "quadprog",
# Indicating "trace = T" allows us to use
# additional information in the later parts
trace = T)
port_MV
## ***********************************
## PortfolioAnalytics Optimization
## ***********************************
##
## Call:
## optimize.portfolio(R = returns, portfolio = portspec1, optimize_method = "quadprog",
## trace = T)
##
## Optimal Weights:
## JNJ PG AAPL TSM MSFT NVDA
## 0.3500 0.3500 0.0773 0.1426 0.0800 0.0000
##
## Objective Measure:
## StdDev
## 0.009587
This time, I am optimizing for the other extreme case, which
maximizes returns of the portfolio. The steps are similar to those used
in Section 5.1, except for a change in the objective and the type of
solver used. Since maximizing return is a linear optimization problem,
we can use the glpk
solver, or simply
indicate ROI
.
<- portfolio.spec(colnames(returns))
portspec2
# We use the same constraints as in Case 1
<- add.constraint(portspec2,
portspec2 type = "weight_sum",
min_sum = 1, max_sum = 1)
<- add.constraint(portspec2,
portspec2 type="box",
min=0, max=0.35)
# Objective to maximize return based on mean return
<- add.objective(portspec2,
portspec2 type = "return",
name = "mean")
<- optimize.portfolio(returns,
port_MR
portspec2,optimize_method = "glpk",
trace = T)
port_MR
## ***********************************
## PortfolioAnalytics Optimization
## ***********************************
##
## Call:
## optimize.portfolio(R = returns, portfolio = portspec2, optimize_method = "glpk",
## trace = T)
##
## Optimal Weights:
## JNJ PG AAPL TSM MSFT NVDA
## 0.00 0.00 0.35 0.00 0.30 0.35
##
## Objective Measure:
## mean
## 0.001352
To compare the portfolios, we need to extract the optimal weights that had been calculated in the previous steps and calculate the portfolio returns. I have created a bar chart to show how the optimal weights differ in each portfolio.
<- extractWeights(port_MV)
weight_MV
<- extractWeights(port_MR)
weight_MR
par(mfrow = c(2,1), mar = c(2, 4, 2, 2))
barplot(weight_MV,
main = "Weights in Different Portfolios",
ylab = "Portfolio MV")
barplot(weight_MR,
ylab = "Portfolio MR")
With the weights, we can calculate the daily returns of the different
portfolio using the Return.portfolio
function. I have included the re-balancing periodicity, but you may
choose to leave it out or change to months or years depending on your
preference.
<- Return.portfolio(returns,
rp_MV weights = weight_MV,
rebalance_on = "quarters",
geometric = T) %>%
`colnames<-`("Portfolio MV")
<- Return.portfolio(returns,
rp_MR weights = weight_MR,
rebalance_on = "quarters",
geometric = T) %>%
`colnames<-`("Portfolio MR")
With the portfolio returns, we can compare the how the portfolios performed against the market benchmark. I have included a chart of cumulative returns and some statistics and metrics as I had done in Section 4.
<- cbind(rp_MV, rp_MR, benchmarkReturn)
comparison
charts.PerformanceSummary(comparison,
main = "Comparing Performance of Portfolios",
legend.loc = "topleft")
table.Stats(comparison)
table.AnnualizedReturns(comparison, 0.03/252, scale = 252)
Let us create a new portfolio specification for back-testing that follows the mean-variance portfolio theory.
<- portfolio.spec(colnames(returns))
portspec3
# Slight change in weight_sum to reduce restrictiveness while optimizing
<- add.constraint(portspec3,
portspec3 type = "weight_sum",
min_sum = 0.99, max_sum = 1.01)
<- add.constraint(portspec3,
portspec3 type="box",
min = 0, max = 0.4)
# Add a constraint to have a target return
<- add.constraint(portspec3,
portspec3 type = "return",
return_target = 0.15/252)
# Adding a return objective, thus we maximize mean return per unit of risk
<- add.objective(portspec3,
portspec3 type = "return",
name = "mean")
<- add.objective(portspec3,
portspec3 type = "risk",
name = "var")
The PortfolioAnalytics
package provides
a handy function called
optimize.portfolio.rebalancing
which can
provide back-testing capabilities. It is similar to the function
optimize.portfolio
but has requires other
inputs. To do back-testing, we need to input the training period for
when the portfolio will be optimized, and the rolling window which rolls
over historical data. For this part, I will be using the random
portfolios solver and in order to do so, I will first generate a matrix
of random portfolios. This will then be fed to the optimization function
to prevent recalculation. A larger number of random portfolios tend to
lead to better optimization results but will also require more time for
calculation.
set.seed(123)
<- random_portfolios(portfolio = portspec3,
randport permutations = 20000,
rp_method = "sample",
eliminate = T)
<- optimize.portfolio.rebalancing(returns,
port_MeanVar_qRebal
portspec3,optimize_method = "random",
rebalance_on = "quarters",
rp = randport,
maxSR = T, # maximize Sharpe Ratio
search_size = 5000,
training_period = 504,
rolling_window = 45)
I have generated the randport
first so that the
optimize.portfolio.rebalancing
function
does not recalculate these portfolios. Since our data is in daily
periods, I have used 504 days as the optimization period for the
portfolio for which weights are calculated, and this historical data
rolls forward by 45 days each time. We can take a look at how the
weights changed over time by using the
chart.Weights
function.
chart.Weights(port_MeanVar_qRebal)
Using the function Return.portfolio
, we
calculate the returns of the portfolio. Since the optimization included
quarterly re-balancing, there is no need to indicate it in this
function.
<- extractWeights(port_MeanVar_qRebal)
w_qOptim
<- Return.portfolio(returns,
rp_MeanVar weights = w_qOptim,
geometric = T)
Lastly, we can visualize the performance of this portfolio by plotting its cumulative returns and compare it against the market benchmark.
<- cbind(rp_MeanVar,benchmarkReturn)
evaluation
charts.PerformanceSummary(evaluation, Rf=0.03/252, main = "Performance of Mean-Variance Portfolio")
table.AnnualizedReturns(evaluation, Rf=0.03/252)
This project serves to apply what I have learnt in R to portfolio optimization. It documents the functions and packages that are used and how to create interesting visualizations of portfolio performance. While optimization and back-testing is a useful and important concept, it is critical to note that past performance of stocks should not be used to indicate future performance. It only serves to test how certain investment strategies performed based on historical data and whether it would suit an investor’s preference for risk and return.